In the last tutorial, we acted as data engineers: we collected the data and transformed it into a more convenient form for the data analyst and the data scientist to use. Now, the modified data is delivered to a data analyst.
The job of a data analyst is to transform data into information: any insight that helps in achieving the higher goal, which in our case is STLF (Short-Term Load Forecasting). The useful information extracted will help the data scientist create robust models. Another essential part of a data analyst's job is to communicate the results as effectively as possible.
As you might have already observed, this tutorial is neither a Word document nor a PDF. The file type of this document is HTML, the same file type as the webpages you view in your internet browser, and this file should also be opened in your browser. I created this report, and all the other tutorials, in this format. It provides us with the best data visualization tools available. RStudio calls such reports ‘R Notebooks.’ R Notebooks can be converted to PDFs, Word documents, and even PowerPoint slides. It is a beautiful way of creating data analysis reports. You can also add videos to your report. Below is a video tutorial to get started with R Notebooks.
You can also download the code to create this report from the button on the top right labeled as ‘Code.’
In the last tutorial, you created a script to read and process data to be ready for further analysis. With this tutorial, you will find “Houses.csv” which contains the total usage of the nine houses specified in the metadata file. We will analyze this data in this tutorial.
Houses = read.csv("Houses.csv")
Houses$Date_Time = as.POSIXct(Houses$Date_Time)
dim(Houses)
## [1] 8760 10
Houses.csv has 8760 rows and ten columns.
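The row count is worth a quick sanity check: 8760 equals 365 × 24, so the file should hold exactly one year of hourly readings with no gaps. Below is a minimal sketch of such a check. `check_hourly` is a hypothetical helper, not part of the tutorial's code, and the `toy` vector only stands in for `Houses$Date_Time`. The toy timestamps are built in UTC, on the assumption that the data carries no daylight-saving shifts.

```r
# Hypothetical helper: check that a POSIXct vector is a gap-free hourly sequence.
check_hourly <- function(x) {
  # Spacing between consecutive timestamps, expressed in hours
  steps <- as.numeric(diff(x), units = "hours")
  list(
    n_rows      = length(x),
    n_missing   = sum(is.na(x)),
    all_hourly  = isTRUE(all(steps == 1)),  # FALSE if any gap, jump, or NA
    n_duplicate = sum(duplicated(x))
  )
}

# Toy stand-in for Houses$Date_Time: one non-leap year of hourly timestamps
toy <- seq(as.POSIXct("2018-06-01 00:00:00", tz = "UTC"),
           by = "hour", length.out = 365 * 24)
result <- check_hourly(toy)
str(result)
```

With the tutorial data, the call would be `check_hourly(Houses$Date_Time)`. If `all_hourly` comes back `FALSE`, one possible cause is a daylight-saving shift in your local time zone; passing `tz = "UTC"` to `as.POSIXct` is one way to rule that out.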
str(Houses)
## 'data.frame': 8760 obs. of 10 variables:
## $ Date_Time: POSIXct, format: "2018-06-01 00:00:00" "2018-06-01 01:00:00" ...
## $ House14 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ House15 : num 4.9363 0.0216 0.0216 0.0216 0.9376 ...
## $ House18 : num 1.404 1.383 0.815 0.974 1.318 ...
## $ House2 : num 0.805 0.814 0.776 0.613 0.595 ...
## $ House21 : num 0.816 1.399 2.59 2.402 2.364 ...
## $ House26 : num 3.57 3.15 3.16 3.26 2.88 ...
## $ House39 : num 2.23 2.14 2.17 2.1 1.95 ...
## $ House4 : num 3.72 3.78 3.43 3.32 3.15 ...
## $ House9 : num 0.765 0.399 0.34 0.39 1.039 ...
The first column is Date_Time, which contains hourly date and time labels. The remaining nine columns contain the hourly electricity consumption of the nine households. The unit of these columns is kW.
summary(Houses)
## Date_Time House14 House15
## Min. :2018-06-01 00:00:00 Min. :0.00000 Min. :0.000797
## 1st Qu.:2018-08-31 05:45:00 1st Qu.:0.07953 1st Qu.:0.722849
## Median :2018-11-30 11:30:00 Median :0.15190 Median :1.123162
## Mean :2018-11-30 11:30:00 Mean :0.28536 Mean :1.546613
## 3rd Qu.:2019-03-01 17:15:00 3rd Qu.:0.32116 3rd Qu.:2.024532
## Max. :2019-05-31 23:00:00 Max. :3.71810 Max. :9.153413
## House18 House2 House21 House26
## Min. :0.00002 Min. :0.0000 Min. :0.0001 Min. :0.0005
## 1st Qu.:0.28800 1st Qu.:0.1657 1st Qu.:0.3578 1st Qu.:0.5133
## Median :0.43756 Median :0.3245 Median :0.5542 Median :0.7940
## Mean :0.63221 Mean :0.4111 Mean :0.7428 Mean :1.0076
## 3rd Qu.:0.84921 3rd Qu.:0.5568 3rd Qu.:0.8100 3rd Qu.:1.1825
## Max. :2.53655 Max. :3.4504 Max. :4.5927 Max. :4.5039
## House39 House4 House9
## Min. :0.0001 Min. :0.00032 Min. :0.000032
## 1st Qu.:0.3840 1st Qu.:0.98107 1st Qu.:0.145839
## Median :0.6132 Median :1.39622 Median :0.259798
## Mean :0.6704 Mean :1.65321 Mean :0.482208
## 3rd Qu.:0.8969 3rd Qu.:2.10885 3rd Qu.:0.563267
## Max. :2.9461 Max. :7.00872 Max. :5.874098
Explain the summary above.
Another way to display the same information is a boxplot. See the interactive boxplots below (Hover your cursor over the plot).
library(plotly)
library(reshape)
houses_melt = melt(Houses[,-1])
ggplotly(
ggplot(houses_melt)+
geom_boxplot(aes(x = variable, y = value))+
labs(x="Houses", y = "Usage [kW]")
)
Explain the code above.
List all the useful information that you can extract from these boxplots.
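The numbers behind a boxplot can also be read off directly, which helps when comparing houses precisely. The sketch below uses base R's `boxplot.stats`, which applies a 1.5 × IQR whisker rule; its hinges may differ slightly from the quartiles that ggplot2 computes. `box_summary` and the `toy` data frame are hypothetical stand-ins, not part of the tutorial's code or data.

```r
# Hypothetical helper: the five numbers a boxplot draws, plus the outlier count.
box_summary <- function(x) {
  s <- boxplot.stats(x)  # whiskers reach the most extreme points within 1.5 * IQR
  c(lower_whisker = s$stats[1],
    q1            = s$stats[2],
    median        = s$stats[3],
    q3            = s$stats[4],
    upper_whisker = s$stats[5],
    n_outliers    = length(s$out))
}

# Toy stand-in for Houses[,-1]: one column with an extreme value, one constant
toy <- data.frame(HouseA = c(1, 2, 3, 4, 5, 100),
                  HouseB = c(2, 2, 2, 2, 2, 2))

# One column of statistics per house
res <- sapply(toy, box_summary)
print(res)
```

With the tutorial data, `sapply(Houses[,-1], box_summary)` would return one column of statistics per house.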
ggplotly(
ggplot(Houses, aes(x = Date_Time))+
geom_line(aes(y = House2, color = "House 2"))+
geom_line(aes(y = House21, color = "House 21"))+
geom_line(aes(y = House26, color = "House 26"))+
geom_line(aes(y = House39, color = "House 39"))+
geom_line(aes(y = House4, color = "House 4"))+
geom_line(aes(y = House9, color = "House 9"))+
geom_line(aes(y = House14, color = "House 14"))+
geom_line(aes(y = House15, color = "House 15"))+
geom_line(aes(y = House18, color = "House 18"))+
theme(legend.title = element_blank()) + labs(x = "Date\\Time", y = "Usage [kW]")
)
The plot above shows the full data set. The y-axis shows the electricity load of each household, and the x-axis shows the Date\Time. Click on the legend labels on the right to add or remove households from the plot. Double-click on any house in the legend to isolate its plot.
Observe that, for every house, electricity consumption is high in the summer months and low in the winter months. Electricity consumption in Pakistan is highly dependent on the weather.
Isolate House 2. What is strange about this house’s electricity consumption? What is a possible explanation?
Isolate House 14. What is strange about this house’s electricity consumption? What is a possible explanation?
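The seasonal pattern can also be checked numerically by averaging the load per calendar month. The following is a minimal sketch, assuming a POSIXct time column and a numeric load column. `monthly_mean` is a hypothetical helper, and the toy series (a constant 2 kW in June and 1 kW in July) is made up purely for illustration.

```r
# Hypothetical helper: mean load per calendar month.
monthly_mean <- function(time, load) {
  month <- format(time, "%Y-%m")  # e.g. "2018-06"; these labels sort chronologically
  tapply(load, month, mean, na.rm = TRUE)
}

# Toy stand-in: June and July 2018, hourly (30 + 31 days)
time <- seq(as.POSIXct("2018-06-01 00:00:00", tz = "UTC"),
            by = "hour", length.out = 61 * 24)
load <- ifelse(format(time, "%m") == "06", 2, 1)

m <- monthly_mean(time, load)
print(m)
```

With the tutorial data, `monthly_mean(Houses$Date_Time, Houses$House15)` would give twelve monthly averages that can be compared across seasons.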
One of the most interesting houses is House 15; it has the highest variation in electricity consumption. Let’s isolate House 15 for further analysis.
House_15 = Houses[,c(1,3)]
House_15$Month = paste(months(House_15$Date_Time, abbreviate = TRUE) ,
as.POSIXlt(House_15$Date_Time)$year+1900)
ggplotly(
ggplot(House_15)+
geom_boxplot(aes(x = Month, y = House15))+
labs(x="Month", y = "Usage [kW]")+
theme(axis.text.x = element_text(angle = 45))+
scale_x_discrete(limits= c(paste(month.abb[6:12], "2018"),
paste(month.abb[1:5], "2019")))
)
The above box plot shows the energy consumption pattern of House 15 for each month.
Create a similar boxplot showing the hourly pattern for House 4. I have attached a snapshot of the plot for your guidance.
Hourly Boxplot
There is a clear pattern followed by House 4 every day, as shown in the hourly boxplot. The electricity consumption is high at night and low during the day. Most of the residents of the household might have a routine of leaving the house at 7 or 8 am and returning home at 8 or 9 pm. This information can be used for STLF.
Is there any other information that you can deduce from the plots above?
I have a hypothesis that should be explored further and could be beneficial for STLF:
Electricity consumption in every hour depends on the electricity consumption of the previous hour.
I deduced this statement from the Hourly Boxplot. As House 4 follows a daily pattern, the electricity consumption of an hour can be approximated from the electricity consumption of the same hour of the previous day.
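One way to start exploring this idea is to measure how strongly a load series correlates with a lagged copy of itself, at a lag of 1 hour (previous hour) and 24 hours (same hour of the previous day). The sketch below is only an illustration under stated assumptions: `lag_cor` is a hypothetical helper, and `toy_load` is a synthetic daily sine wave, not the tutorial's data.

```r
# Hypothetical helper: correlation between a series and itself k steps earlier.
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)], use = "complete.obs")
}

# Synthetic stand-in for one house: a pure daily cycle over two weeks of hours
hours <- 0:(14 * 24 - 1)
toy_load <- 1 + sin(2 * pi * hours / 24)

r1  <- lag_cor(toy_load, 1)   # previous hour
r24 <- lag_cor(toy_load, 24)  # same hour, previous day
print(c(lag_1 = r1, lag_24 = r24))
```

For a pure daily cycle the lag-24 correlation is essentially 1. On real data, for example `lag_cor(Houses$House4, 24)`, both values will be lower, and comparing them gives a first indication of which lag carries more information.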
Create an R markdown report, doing a similar analysis for all households.
Explain the patterns that you observe in these households’ electricity consumption.
Email me your detailed report.